arXiv:1509.00824v1 [math.OC] 2 Sep 2015


A NOTE ON PROBABLY CERTIFIABLY CORRECT ALGORITHMS 


AFONSO S. BANDEIRA 


Abstract. Many optimization problems of interest are known to be intractable, and while there are often
heuristics that are known to work on typical instances, it is usually not easy to determine a posteriori whether
the optimal solution was found. In this short note, we discuss algorithms that not only solve the problem on
typical instances, but also provide a posteriori certificates of optimality: probably certifiably correct (PCC)
algorithms. As an illustrative example, we present a fast PCC algorithm for minimum bisection under the
stochastic block model and briefly discuss other examples.


1. Introduction 

Estimation problems in many areas are often formulated as optimization problems, maximum likelihood
estimation being a prime example. Unfortunately, these optimization problems are, in many relevant
instances, believed to be computationally intractable.

To circumvent the computational intractability of these problems, an impressive body of work is dedicated
to proposing and understanding algorithms that are guaranteed to produce a solution with certain approximation
guarantees [43]. Since these guarantees have to hold for worst-case inputs, they tend to be pessimistic;
fortunately, many real-world scenarios do not resemble these worst-case instances.

In line with the “real-world data is not your enemy” paradigm, there is a line of work that attempts
to propose and understand algorithms that work on only some sets of, hopefully typical, inputs. A prime
example is sparse recovery, where the popular Compressed Sensing papers of Candès, Romberg, Tao, and
Donoho [15, 20] established that, while finding a sparse solution to an underdetermined system is
computationally intractable in the worst case, a simple efficient algorithm succeeds with high probability in many
natural probabilistic models of instances. Important examples also include planted bisection [30, 21] and
matrix completion [14], among many others.

Definition 1.1 (Probabilistic Algorithm). Given an optimization problem that depends on an input and a
probability distribution D over the inputs, we say an algorithm is a probabilistic algorithm for this problem
and distribution of instances if it finds an optimal solution with high probability,¹ with respect to D.

Although these guarantees make excellent arguments towards the efficacy of certain probabilistic
algorithms, they have the drawback that, oftentimes, it is not easy to check a posteriori whether the solution
computed is the optimal one. Even in the idealized scenario where the input really follows the distribution in
the guarantee, there is a nonzero probability of the algorithm not producing the optimal solution, and it is
often not possible to tell whether it did or not. To make matters worse, in practice, the distribution
of instances rarely matches exactly the idealized one in the guarantees.

The situation is different for a certain class of algorithms, convex relaxation based ones [17]. Some of these
methods work by enlarging the feasibility set of the optimization problem to a convex set where optimizing
the objective function becomes tractable. While the optimal solution is not guaranteed to be in the original
feasibility set, there are many examples for which rounding procedures are known to produce solutions with
approximation guarantees (one such example being the Goemans-Williamson [22] approximation algorithm
for Max-Cut). On the other hand, if the solution happens to lie in the original feasibility set, then one is
sure that it must be the optimal solution of the original problem (providing, also, an a posteriori certificate).
Fortunately, this tends to be the case for many examples of problems and relaxations [6, 7, 9, 12, 16]. This
motivates the following definition.

¹We say that an event happens with high probability if its probability tends to 1 as the underlying parameters tend to
infinity.

Definition 1.2 (Probably Certifiably Correct (PCC) Algorithm). Given an optimization problem that
depends on an input and a probability distribution D over the inputs, we say an algorithm is a Probably
Certifiably Correct (PCC) algorithm for this problem and distribution of instances if, with high probability
(w.r.t. D), it finds an optimal solution and certifies to have done so. Moreover, it never incorrectly certifies
a non-optimal solution.

A PCC algorithm has the advantage of being able to produce a posteriori certificates. In particular, this
makes such algorithms more appealing in settings where the distribution of problem instances may not
coincide with the idealized ones in proved guarantees. Indeed, duality is very often used in practice to provide
a posteriori certificates of quality of a solution to an optimization problem (see [18] for a particularly recent
example in the problem of Simultaneous Localization and Mapping (SLAM)). While this is a great argument
towards the use of convex relaxation-based approaches,² the convex problems that many such algorithms
are relaxed to are semidefinite programs (SDP) which, while solvable (to arbitrary precision) in polynomial
time [41], tend to be computationally costly.³

Many probabilistic algorithms not based on convex relaxations (such as, for example, spectral
methods [30], message-passing type algorithms [31], alternating-minimization/expectation-maximization
techniques [27]) often do not enjoy a posteriori guarantees, but tend to be considerably more efficient than the
convex relaxation-based competitors. This motivates the natural question of whether it is possible to devise
a posteriori certifiers for candidate solutions produced by these or other methods.

Definition 1.3 (Probabilistic Certifier). Given an optimization problem that depends on an input, a
probability distribution D over the inputs, and a candidate solution for it, we call a Probabilistic Certifier a
method that:

• With high probability (w.r.t. D), if the candidate solution is an optimal one it outputs: The solution
is optimal.⁴ It may, with vanishing probability,⁵ output: Not sure whether the solution is
optimal.

• If the candidate solution is not an optimal solution, it always outputs: Not sure whether the
solution is optimal.

A particularly natural way of constructing such certifiers is to rely on convex relaxation-based PCC
algorithms; given a candidate solution computed by a probabilistic algorithm, one can check whether it
is an optimal solution to a certain convex relaxation. Remarkably, it is sometimes considerably faster to
check whether a candidate solution is an optimal solution of a convex program than to solve the program; in
many such cases, one can devise faster PCC algorithms by combining fast probabilistic algorithms with fast
methods to certify that a candidate solution is an optimal solution to a convex relaxation, or even that it is
the unique optimal solution (as will be the case with Algorithm 2.2). In the next section, we use the
problem of minimum bisection under the stochastic block model to illustrate these ideas.

2. A fast PCC algorithm for recovery in the stochastic block model

The problem of minimum bisection on a graph is a particularly simple instance of community detection
that is known to be NP-hard. Recently, there has been interest in understanding the performance of several
heuristics in typical realizations of a certain random graph model that exhibits community structure, the
stochastic block model: given n even and 0 < q < p < 1, we say that a random graph G is drawn from
$\mathcal{G}(n,p,q)$, the Stochastic Block Model with two communities, if G has n nodes, divided into two clusters of n/2
nodes each, and each pair of vertices (i,j) is an edge of G with probability p if i and j are in the
same cluster and with probability q otherwise, independently of any other edge.

²Note that there is another type of convex relaxation-based algorithms, such as in sparse recovery, where instead of enlarging
the feasibility set, one replaces the objective function by a convex surrogate. Unfortunately, in that case, it appears to be
more difficult to certify, a posteriori, the optimality of the solutions. For the particular case of sparse recovery, we refer the
reader to [36] for a discussion on certificates of optimality.

³It is also fairly common for problems to be relaxed to linear programs, which tend to be computationally cheaper.

⁴In some cases, certifiers may also certify that a solution is not only optimal, but the unique optimal solution, as will be
the case with Algorithm 2.2.

⁵By vanishing probability we mean probability tending to 0 as the underlying parameters tend to infinity.

Let A denote the adjacency matrix of G. We define the signed adjacency matrix B as $B = 2A - (\mathbf{1}\mathbf{1}^T - I)$.
To each partitioning of the nodes we associate a vector x with ±1 entries corresponding to cluster
memberships. The minimum bisection of G can be written as

$$\max_{x} \; x^T B x \quad \text{s.t.} \quad |x_i| = 1 \;\; \forall i, \qquad x^T \mathbf{1} = 0. \tag{1}$$
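To make the model and the objective in (1) concrete, the following is a minimal sketch that samples a graph from the two-community stochastic block model, forms the signed adjacency matrix B, and evaluates the quadratic form for a given ±1 vector. The function names (`sample_sbm`, `signed_adjacency`, `objective`) are illustrative and not from the note, the first n/2 nodes are assumed to form one cluster, and plain Python lists are used so the sketch has no dependencies.

```python
import random


def sample_sbm(n, p, q, seed=0):
    """Sample a two-community stochastic block model graph.

    Returns (A, x): the adjacency matrix as a list of lists and the planted
    partition as a +/-1 vector (assumption: first n/2 nodes share a cluster).
    """
    if n % 2 != 0:
        raise ValueError("n must be even")
    rng = random.Random(seed)
    x = [1] * (n // 2) + [-1] * (n // 2)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            prob = p if x[i] == x[j] else q  # within-cluster vs. across-cluster
            if rng.random() < prob:
                A[i][j] = A[j][i] = 1
    return A, x


def signed_adjacency(A):
    """B = 2A - (11^T - I): +1 on edges, -1 on non-edges, 0 on the diagonal."""
    n = len(A)
    return [[0 if i == j else 2 * A[i][j] - 1 for j in range(n)] for i in range(n)]


def objective(B, x):
    """The quadratic form x^T B x that is maximized in (1)."""
    n = len(x)
    return sum(x[i] * B[i][j] * x[j] for i in range(n) for j in range(n))


A, x_planted = sample_sbm(20, 0.8, 0.1, seed=1)
B = signed_adjacency(A)
print(objective(B, x_planted))
```

Each within-cluster edge and each across-cluster non-edge contributes positively to the quadratic form, which is why maximizing it over balanced ±1 vectors favors partitions with few crossing edges.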

Setting $p = \alpha \frac{\log n}{n}$ and $q = \beta \frac{\log n}{n}$, it is known [2, 32] that the hidden partition, with high probability,
coincides with the minimum bisection, and can be computed efficiently provided that

$$\sqrt{\alpha} - \sqrt{\beta} > \sqrt{2}. \tag{2}$$

On the other hand, if $\sqrt{\alpha} - \sqrt{\beta} < \sqrt{2}$, then, with high probability, the maximum likelihood estimator (which
corresponds to the minimum bisection) does not coincide with the hidden partition.
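As a small worked illustration of condition (2), the sketch below converts given values of n, p, q into the parameters α and β (assuming the scaling p = α log n / n and q = β log n / n with the natural logarithm) and reports the gap √α − √β − √2. The helper name `sbm_regime` is hypothetical. Since (2) is an asymptotic statement, the sign of the gap at a finite n is only a heuristic indication.

```python
import math


def sbm_regime(n, p, q):
    """Return (alpha, beta, gap), where p = alpha*log(n)/n, q = beta*log(n)/n
    and gap = sqrt(alpha) - sqrt(beta) - sqrt(2); gap > 0 corresponds to (2)."""
    scale = n / math.log(n)
    alpha, beta = p * scale, q * scale
    gap = math.sqrt(alpha) - math.sqrt(beta) - math.sqrt(2)
    return alpha, beta, gap


n = 1000
alpha, beta, gap = sbm_regime(n, 9 * math.log(n) / n, 1 * math.log(n) / n)
print(alpha, beta, gap > 0)
```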

Remarkably, for the stochastic block model on two communities with parameters in the regime (2),
there are quasi-linear time algorithms known to be probabilistic algorithms for minimum bisection (see, for
example, [3]). A convex relaxation, proposed in [2], was also recently shown to exactly compute the minimum
bisection in the same regime [8, 23].

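The convex relaxation in [2, 8, 23] is obtained by writing (1) in terms of a new variable $X = xx^T$. More
precisely, (1) is equivalent to

$$\begin{aligned}
\max_{X} \quad & \mathrm{Tr}(BX) \\
\text{s.t.} \quad & X_{ii} = 1, \;\; \forall i \\
& X \succeq 0 \\
& \mathrm{rank}(X) = 1 \\
& \mathrm{Tr}(X \mathbf{1}\mathbf{1}^T) = 0.
\end{aligned} \tag{3}$$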

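The semidefinite programming relaxation considered is obtained by removing the last two constraints:⁶

$$\begin{aligned}
\max_{X} \quad & \mathrm{Tr}(BX) \\
\text{s.t.} \quad & X_{ii} = 1, \;\; \forall i \\
& X \succeq 0.
\end{aligned} \tag{4}$$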

The argument in [8, 23] makes use of duality [42]. Since (4) satisfies Slater’s condition,⁷ the optimal value
of the dual program given by

$$\begin{aligned}
\min_{D} \quad & \mathrm{Tr}(D) \\
\text{s.t.} \quad & D - B \succeq 0 \\
& D \text{ is diagonal}
\end{aligned} \tag{5}$$

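is known to match the optimal value of (4). More precisely, given the hidden partition $x_\natural \in \{\pm 1\}^n$, Abbe
et al. [2] propose $D_\natural := D_{\mathrm{diag}(x_\natural) B \mathrm{diag}(x_\natural)}$ as a candidate solution for the dual,⁸ where $\mathrm{diag}(x_\natural)$ is a diagonal
matrix whose diagonal is given by $x_\natural$ and $D_\natural = D_{\mathrm{diag}(x_\natural) B \mathrm{diag}(x_\natural)}$ is a diagonal matrix whose diagonal is
given by

$$\left[ D_{\mathrm{diag}(x_\natural) B \mathrm{diag}(x_\natural)} \right]_{ii} = \sum_{j=1}^{n} \left[ \mathrm{diag}(x_\natural) B \,\mathrm{diag}(x_\natural) \right]_{ij} = (x_\natural)_i \sum_{j=1}^{n} B_{ij} (x_\natural)_j .$$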
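The dual candidate can be computed directly from its row-sum definition. The sketch below does so for a toy instance (two disjoint triangles, an assumption chosen because everything can be checked by hand) and verifies two consequences of the definition: (D − B)x vanishes entrywise, and Tr(D) equals xᵀBx. The function names are illustrative.

```python
def dual_candidate(B, x):
    """Diagonal of D_{diag(x) B diag(x)}: d[i] = x[i] * sum_j B[i][j] * x[j]."""
    n = len(x)
    return [x[i] * sum(B[i][j] * x[j] for j in range(n)) for i in range(n)]


def residual(B, x):
    """Entries of (D - B) x; all zero whenever x has +/-1 entries."""
    n = len(x)
    d = dual_candidate(B, x)
    return [d[i] * x[i] - sum(B[i][j] * x[j] for j in range(n)) for i in range(n)]


# Toy instance: two disjoint triangles, so B[i][j] = x[i] * x[j] off the diagonal.
x_nat = [1, 1, 1, -1, -1, -1]
B = [[0 if i == j else x_nat[i] * x_nat[j] for j in range(6)] for i in range(6)]

d = dual_candidate(B, x_nat)
value = sum(x_nat[i] * B[i][j] * x_nat[j] for i in range(6) for j in range(6))
print(d, residual(B, x_nat), sum(d) == value)
```

The vanishing residual is the complementary slackness condition in matrix form, and the trace identity says the dual candidate attains the primal value; what remains to be checked is only the semidefinite constraint D − B ⪰ 0.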


⁶(4) would still be a semidefinite program if the constraint $\mathrm{Tr}(X \mathbf{1}\mathbf{1}^T) = 0$ were kept, but it turns out that it is not needed
and removing it renders the analysis slightly simpler.

⁷Slater’s condition is a technical condition that ensures strong duality and tends to be satisfied in many relevant problems;
in this case it asks that there is a feasible point X that has no zero eigenvalues and the identity matrix serves as an example,
see [42] for more details.

⁸Interestingly, in this case, the equality constraints and complementary slackness conditions are enough to pinpoint a single
possible candidate for a dual certificate, and only the semidefinite constraint needs to be checked; this is the case for semidefinite
programs satisfying certain properties, see [o] for more details.
More recently, [8, 23] showed that, in the parameter regime given by (2) and with high probability, this dual
candidate solution is indeed a feasible solution to (5) whose value matches the value $x_\natural^T B x_\natural$. This
implies that $x_\natural x_\natural^T$ is an optimal solution of (4). Moreover, since [8, 23] show that

$$D_\natural - B \succeq 0 \quad \text{and} \quad \lambda_2(D_\natural - B) > 0, \tag{6}$$

where $\lambda_2$ denotes the second smallest eigenvalue, the argument can be easily strengthened (using
complementary slackness) to show that $x_\natural x_\natural^T$ is the unique optimal solution (see [8, 23] for details).

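A particularly enlightening way of showing that (6) indeed certifies that the partitioning given by $x_\natural$ is
the unique solution to the minimum bisection problem is to note that, for any other candidate bisection
$x \in \{\pm 1\}^n$,

$$\begin{aligned}
x_\natural^T B x_\natural - x^T B x &= x^T \left[ D_\natural - B \right] x + \sum_{i=1}^{n} (D_\natural)_{ii} \left( 1 - x_i^2 \right) \\
&= x^T \left[ D_\natural - B \right] x.
\end{aligned} \tag{7}$$

Since $D_\natural - B$ is known to satisfy (6), then $x^T [D_\natural - B] x \geq 0$. Moreover, since $[D_\natural - B] x_\natural = 0$, if x
corresponds to another bisection (meaning that $x \neq x_\natural$ and $x \neq -x_\natural$) then $x^T [D_\natural - B] x > 0$, implying that
$x_\natural^T B x_\natural - x^T B x > 0$.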
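Identity (7) is purely algebraic: it uses only that the entries of both vectors are ±1, so it can be checked exhaustively on a small instance. The sketch below does this for a random symmetric ±1 matrix B with zero diagonal (an arbitrary test matrix, not an SBM sample) and all 2⁶ sign vectors; integer arithmetic keeps the comparison exact. The names `quad` and `identity_7_holds` are illustrative.

```python
from itertools import product
import random


def quad(M, u, v):
    """Bilinear form u^T M v for list-of-lists M."""
    n = len(u)
    return sum(u[i] * M[i][j] * v[j] for i in range(n) for j in range(n))


def identity_7_holds(B, x_nat):
    """Check x_nat^T B x_nat - x^T B x == x^T (D - B) x for every x in {+1,-1}^n."""
    n = len(x_nat)
    d = [x_nat[i] * sum(B[i][j] * x_nat[j] for j in range(n)) for i in range(n)]
    M = [[(d[i] if i == j else 0) - B[i][j] for j in range(n)] for i in range(n)]
    best = quad(B, x_nat, x_nat)
    return all(best - quad(B, x, x) == quad(M, x, x)
               for x in product((-1, 1), repeat=n))


rng = random.Random(0)
n = 6
B = [[0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        B[i][j] = B[j][i] = rng.choice((-1, 1))

print(identity_7_holds(B, [1, -1, 1, -1, 1, -1]))
```

The identity holds for any B; what depends on the instance is whether D − B is positive semidefinite, which is exactly what (6) asserts in the regime (2).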

Remark 2.1. A particularly fruitful interpretation of (7) is to think of it as a sum-of-squares certificate.
More precisely, since $D_\natural - B$ is positive semidefinite it has a Cholesky decomposition $D_\natural - B = VV^T$, which
means that, for $x \in \{\pm 1\}^n$,

$$x_\natural^T B x_\natural - x^T B x = x^T V V^T x = \left\| V^T x \right\|^2 = \sum_{j=1}^{n} \left( \sum_{i=1}^{n} V_{ij} x_i \right)^2 .$$

By writing $x_\natural^T B x_\natural - x^T B x$ as a sum of squares, we certify that $x_\natural^T B x_\natural$ is the optimal value, and hence
that $x_\natural$ is an optimal solution. It turns
out that certificates of this type always exist, potentially having to include polynomials of larger degree,
and that, with a fixed bound on the degree of the polynomials involved, these certificates can be found with
semidefinite programming whose complexity depends on the degree bound. This is a simple instance of the
sum-of-squares technique (based on Stengle’s Positivstellensatz [40]) proposed independently in a few different
areas [37, 33, 29, 34] and now popular in theoretical computer science [13].

This suggests the following PCC algorithm for minimum bisection in the Stochastic Block Model. 

Algorithm 2.2. Given a graph G, use the quasi-linear time algorithm in [3] to produce a candidate bisection
$x_*$. If $x_*^T \mathbf{1} = 0$ and

$$\lambda_2 \left( D_{\mathrm{diag}(x_*) B \mathrm{diag}(x_*)} - B \right) > 0, \tag{8}$$

output:

• $x_*$ is the minimum bisection.

If not, output:

• Not sure whether $x_*$ is the minimum bisection.

Note that, since $\left( D_{\mathrm{diag}(x_*) B \mathrm{diag}(x_*)} - B \right) x_* = 0$, (8) automatically implies condition (6) with $x_*$ in place of $x_\natural$.

The following follows immediately from the results in [3] and [8, 23].

Proposition 2.3. Algorithm 2.2 is a Probably Certifiably Correct Algorithm for minimum bisection under
the stochastic block model in the regime of parameters given by (2).
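The certification step of Algorithm 2.2 can be sketched as follows, under several assumptions that are not part of the note. The candidate bisection is taken as an input rather than produced by the algorithm of [3]. The eigenvalue test (8) is replaced by an equivalent positive-definiteness test: since M = D − B is symmetric with M x = 0, one has λ₂(M) > 0 exactly when M + x xᵀ is positive definite, which is checked here by attempting a dense Cholesky factorization. This costs O(n³) and uses floating point with an ad hoc tolerance, so it illustrates the logic only and is neither fast nor a rigorous numerical certificate.

```python
import math


def is_positive_definite(M, tol=1e-9):
    """Attempt a Cholesky factorization; succeed iff every pivot exceeds tol."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        pivot = M[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if pivot <= tol:
            return False
        L[j][j] = math.sqrt(pivot)
        for i in range(j + 1, n):
            L[i][j] = (M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return True


def certify_bisection(A, x):
    """Return True only if x is balanced and condition (8) holds for x."""
    n = len(x)
    if any(v not in (-1, 1) for v in x) or sum(x) != 0:
        return False
    B = [[0 if i == j else 2 * A[i][j] - 1 for j in range(n)] for i in range(n)]
    d = [x[i] * sum(B[i][j] * x[j] for j in range(n)) for i in range(n)]
    # M = D - B satisfies M x = 0, so lambda_2(M) > 0 iff M + x x^T is positive definite.
    shifted = [[(d[i] if i == j else 0) - B[i][j] + x[i] * x[j] for j in range(n)]
               for i in range(n)]
    return is_positive_definite(shifted)


# Toy graph: two disjoint triangles on nodes {0,1,2} and {3,4,5}.
planted = [1, 1, 1, -1, -1, -1]
A = [[1 if i != j and planted[i] == planted[j] else 0 for j in range(6)]
     for i in range(6)]

for candidate in (planted, [1, 1, -1, 1, -1, -1]):
    verdict = ("is the minimum bisection" if certify_bisection(A, candidate)
               else "not sure whether it is the minimum bisection")
    print(candidate, verdict)
```

A failed test never claims that the candidate is wrong; it only withholds the certificate, matching the second output of Algorithm 2.2.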

2.1. Randomized certificates. While Algorithm 2.2 is considerably faster than solving the semidefinite
program (4), it still requires one to check that an n x n matrix has a positive second-smallest eigenvalue,
which we do not know how to do in quasi-linear time. A potentially faster alternative would be to use a
randomized power-method-like algorithm, such as the randomized Lanczos method [28], to estimate the second
smallest eigenvalue of $D_{\mathrm{diag}(x_*) B \mathrm{diag}(x_*)} - B$. Note that since $B = 2A - (\mathbf{1}\mathbf{1}^T - I)$, where A is a sparse
matrix, matrix-vector multiplies with $D_{\mathrm{diag}(x_*) B \mathrm{diag}(x_*)} - B$ can be computed in quasi-linear time. While
the use of such randomized methods would not provide a probabilistic certificate, it could potentially provide
a randomized certificate that has a small probability (with respect to a source of randomness independent
of D) of “certifying” an incorrect solution. However, since it would rely on independent randomness, the
process could be repeated to achieve an arbitrarily small probability of providing false certificates.

In fact, we believe that the analysis of the typical behavior of the second eigenvalue of $D_{\mathrm{diag}(x_*) B \mathrm{diag}(x_*)}$
in [8, 23] and the guarantees for the performance of the randomized Lanczos method [28] can be used to devise
a quasi-linear time randomized procedure that can serve as a randomized certificate for minimum bisection
in the stochastic block model, as described above. However, such a construction falls outside of the scope of
this short note, and so it is left for future research.
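The matrix-free product mentioned above can be sketched as follows: with the graph stored as adjacency lists, B v = 2 A v − (1ᵀv) 1 + v costs time proportional to the number of nodes plus edges, and the diagonal of D is x_i (B x)_i. As a stand-in for the randomized Lanczos method of [28], which is not implemented here, the sketch runs a plain power iteration on the shifted matrix cI − (D − B) restricted to the orthogonal complement of x, with c a Gershgorin bound. All names are illustrative. Because a Rayleigh quotient never exceeds the largest eigenvalue, the returned number can only overestimate the smallest eigenvalue on that complement, so a positive value is evidence rather than proof.

```python
import math
import random


def make_matvec(adj, x):
    """Return (matvec, d): matvec(v) = (D - B) v in O(n + m) time, d = diag(D)."""
    n = len(x)

    def times_B(v):
        total = sum(v)  # (1^T v), shared by every coordinate
        return [2 * sum(v[j] for j in adj[i]) - total + v[i] for i in range(n)]

    Bx = times_B(x)
    d = [x[i] * Bx[i] for i in range(n)]

    def matvec(v):
        Bv = times_B(v)
        return [d[i] * v[i] - Bv[i] for i in range(n)]

    return matvec, d


def estimate_lambda2(adj, x, iters=100, seed=0):
    """Power-iteration estimate (from above) of the smallest eigenvalue of D - B
    on the orthogonal complement of x."""
    n = len(x)
    matvec, d = make_matvec(adj, x)
    shift = max(abs(t) for t in d) + n - 1  # Gershgorin bound on |eigenvalues|
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    rayleigh = 0.0
    for _ in range(iters):
        overlap = sum(x[i] * v[i] for i in range(n)) / n
        v = [v[i] - overlap * x[i] for i in range(n)]  # project out x
        norm = math.sqrt(sum(t * t for t in v))
        v = [t / norm for t in v]
        Mv = matvec(v)
        w = [shift * v[i] - Mv[i] for i in range(n)]
        rayleigh = sum(v[i] * w[i] for i in range(n))
        v = w
    return shift - rayleigh


# Toy graph: two disjoint triangles, given as adjacency lists.
adj = [[1, 2], [0, 2], [0, 1], [4, 5], [3, 5], [3, 4]]
x_star = [1, 1, 1, -1, -1, -1]
print(estimate_lambda2(adj, x_star))
```

On this toy graph D − B equals 6I − x xᵀ, so every vector orthogonal to x is an eigenvector with eigenvalue 6 and the iteration settles immediately; on sampled graphs the number of iterations needed depends on the spectral gap.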

3. Other examples of PCC algorithms and future directions 

One of the most appealing characteristics of Algorithm 2.2 above is that it can be easily generalized to 
many other settings. In fact, given an optimization problem and a distribution D over the instances, if there 
is a fast probabilistic algorithm and a convex relaxation that is known to be tight (meaning that its optimal 
solution is feasible in the original problem), one can make use of both algorithms, similarly to Algorithm 2.2, 
and devise a fast PCC algorithm: by first running the fast probabilistic algorithm and then checking whether 
the candidate solution is the optimal solution to the convex relaxation. Unfortunately, it is not clear, in 
general, whether one can check optimality in the convex relaxation considerably faster than simply solving 
it. On the other hand, many proofs of tightness of convex relaxations also provide a candidate dual solution 
and, in many instances, checking whether this dual solution is indeed a dual certificate is significantly faster. 

For some problems, such as multisection in the stochastic block model with multiple communities, both fast
probabilistic algorithms [3] and tightness guarantees for convex relaxations have already been established [24,
4, 35], suggesting that this framework could be easily applied there. Other problems for which convex
relaxations are known to be tight include: synchronization over Z_2 [1] and SO(2) [9], sparse PCA [6],
k-medians and k-means clustering [7, 26], multiple-input multiple-output (MIMO) channel detection [38],
sensor network localization [39], shape matching [19], and many others. There are several others where
convex relaxations are conjectured to be tight under appropriate conditions, such as the non-negative PCA
problem [31], the multireference alignment problem [10, 11], and the synchronization problem in SLAM [18].
For the case of non-negative PCA, there is a known probabilistic algorithm [31]. We suspect that this
framework may be useful in devising fast PCC algorithms for a large class of problems, perhaps including
many of those described above.⁹ Moreover, even when probabilistic algorithms are not available, fast certifiers
may be useful to test the performance of heuristics in real-world problems.

Acknowledgements. The author thanks Dustin G. Mixon, Soledad Villar, Nicolas Boumal, and Amit 
Singer for interesting discussions and valuable comments on an earlier version of this manuscript. The 
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